ABSTRACT The Middle East respiratory syndrome coronavirus (MERS-CoV) was first documented in the Kingdom of Saudi Arabia
(KSA) in 2012 and, to date, has been identified in 180 cases with 43% mortality. In this study, we have determined the
MERS-CoV evolutionary rate, documented genetic variants of the virus and their distribution throughout the Arabian peninsula,
and identified the genome positions under positive selection, important features for monitoring adaptation of MERS-CoV to
human transmission and for identifying the source of infections. Respiratory samples from confirmed KSA MERS cases from May
to September 2013 were subjected to whole-genome deep sequencing, and 32 complete or partial sequences (20 were >99% complete,
7 were 50 to 94% complete, and 5 were 27 to 50% complete) were obtained, bringing the total available MERS-CoV genomic
sequences to 65. An evolutionary rate of 1.12 × 10^-3 substitutions per site per year (95% credible interval [95% CI],
8.76 × 10^-4; 1.37 × 10^-3) was estimated, bringing the time to most recent common ancestor to March 2012 (95% CI, December
2011; June 2012). Only one MERS-CoV codon, spike 1020, located in a domain required for cell entry, is under strong positive
selection. Four KSA MERS-CoV phylogenetic clades were found, with 3 clades apparently no longer contributing to current
cases. The size of the population infected with MERS-CoV showed a gradual increase to June 2013, followed by a decline,
possibly due to increased surveillance and infection control measures combined with a basic reproduction number (R0) for the
virus that is less than 1.

IMPORTANCE MERS-CoV adaptation toward higher rates of sustained human-to-human transmission appears not to have oc- 
curred yet. While MERS-CoV transmission currently appears weak, careful monitoring of changes in MERS-CoV genomes and 
of the MERS epidemic should be maintained. The observation of phylogenetically related MERS-CoV in geographically diverse 
locations must be taken into account in efforts to identify the animal source and transmission of the virus. 
The Middle East Respiratory Syndrome Coronavirus (MERS-CoV) was first
detected in the Kingdom of Saudi Arabia (KSA) in 2012 (1-4), and to date,
infection with the virus has been identified in 180 patients with 43%
mortality (5). Previously, the SARS coronavirus emerged from an animal
reservoir (6), and a zoonotic event may also provide the source of
MERS-CoV; however, no consistent pattern of animal exposure has been
observed with MERS cases. Serological studies have identified a high
prevalence of MERS-CoV reactive antibodies in camels in Oman, the Canary
Islands, and Egypt (7, 8), and fragments of MERS-CoV sequence have been
reported from bats (9) and camels (10). However, to date, MERS-CoV itself
has not been isolated from any nonhuman source. If such an animal
reservoir exists, MERS-CoV epidemiology could be explained by intermittent
animal-to-human transmission seeding clusters of human-to-human
transmission, but with a reproduction number (R0) of less than 1 (11, 12),
these clusters eventually disappear. An alternative hypothesis is that the
virus has now infected a sufficient number of humans to account for the
observed distribution and diversity of the virus but the infection is
asymptomatic in many individuals. A recent serosurvey of 363 individuals
in Saudi Arabia failed, however, to find MERS-CoV-seropositive
individuals (13).
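
The statement that clusters seeded by animal-to-human transmission eventually disappear when R0 is below 1 can be illustrated with a simple branching-process calculation. The sketch below is illustrative only: it assumes that each case infects on average R0 further people, independently of other cases, in which case the expected total size of a cluster started by one introduction is 1/(1 - R0). The function name and the R0 values are hypothetical placeholders and are not the estimates of references 11 and 12.

```python
def expected_cluster_size(r0: float) -> float:
    """Expected total cases (index case included) in a subcritical
    branching process where each case causes r0 new cases on average."""
    if not 0 <= r0 < 1:
        raise ValueError("the formula only holds for 0 <= r0 < 1")
    # Sum of the geometric series 1 + r0 + r0**2 + ... = 1 / (1 - r0)
    return 1.0 / (1.0 - r0)


if __name__ == "__main__":
    # Hypothetical R0 values, chosen only to show the trend.
    for r0 in (0.3, 0.6, 0.9):
        size = expected_cluster_size(r0)
        print(f"R0 = {r0:.1f}: expected cluster size = {size:.1f}")
```

Under this assumption the expected cluster size stays finite for any R0 below 1 but grows quickly as R0 approaches 1, which is why changes in transmissibility are worth monitoring.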
FIG 1 Bayesian-inferred phylogeny of the 32 new MERS-CoV sequences combined with the 33 previously available genomes (EMC/2012 [JX869059],
Jordan_N3 [KC776174], Munich_AbuDhabi_2013 [KF192507], England-Qatar_2012 [KC667074], Al-Hasa_1_2013 [KF186567], Al-Hasa_2_2013
[KF186566], Al-Hasa_3_2013 [KF186565], Al-Hasa_4_2013 [KF186564], plus all previously published MERS-CoV sequences (17), England2-HPA
[http://www.hpa.org.uk/Topics/InfectiousDiseases/InfectionsAZ/MERSCoV/respPartialgeneticsequenceofnovelcoronavirus/], France_UAE_2013 [KF745068],
Qatar_3_2013 [KF961221], and Qatar_4_2013 [KF961222]). All new genome sequences from this study are labeled in red. Clades are marked with vertical bars
on the right and (with the exception of clade A and the Al-Hasa clade) named by the initial genome in the clade. The scale bar indicates the genetic distance, in
substitutions per site, from the arbitrary midpoint root. Bayesian posterior probabilities for each clade are listed above the relevant node.



A detailed description of MERS-CoV evolution is useful to 
assess public health risks, to help identify the source of new infec- 
tions, and to detect viral adaptation to human transmission. In 
this report, we advance our knowledge of the MERS-CoV out- 
break with complete or partial MERS-CoV genome sequences ob- 
tained directly from 32 recent MERS patient samples from cases 
between July and September 2013, bringing the total available 
MERS-CoV genomic sequences to 65 (37% of the 178 MERS cases 
reported globally). 

RESULTS 

Phylogenetic analysis. All PCR-confirmed MERS case samples from Saudi
Arabia were processed for whole-genome deep sequencing (14, 15), adding
32 new MERS-CoV genome sequences to the publicly available data set. The
phylogenetic relationship of all MERS-CoV genomes was inferred from the
33 previously published genomes (2, 10, 14-16) and 32 new sequences
(Fig. 1). The previously described Al-Hasa clade (17) has expanded, with 6
new members. The Riyadh_3 clade, which includes virus from a Qatari
patient diagnosed in London (15) and a United Arab Emirates patient
diagnosed in Munich (16) (Fig. 1), has increased to 9 members since the
previous report (17) and includes new viruses from Riyadh,
Wadi-Ad-Dawasir, and Ta'if. The Buraidah_1 variant (Fig. 1), first
observed with Buraidah_1_2013, has now expanded to include a virus from
Ta'if, two viruses from Khamis Mushait in the southern province of Asir,
and the UAE_Dubai_France_patient_1 virus identified in a United Arab
Emirates (UAE) patient in Valenciennes, France.

FIG 2 Time-resolved phylogenetic tree of all concatenated coding regions of the 42 phylogenetically distinct MERS-CoV genomes (see Materials and Methods
for further details). A discrete traits model implemented in BEAST version 1.7.5 (36) was used to determine the most probable geographical location for each
branch; a change in branch color indicates a geographical location change during its evolutionary history. Posterior probabilities for the inferred geographical
locations are indicated at the nodes, an asterisk at a node indicates a posterior probability of >0.9 for that clade, and time is indicated on the x axis.

Most of the new genomes cluster with the previous singleton
Hafr-Al-Batin_1_2013 genome, which appeared in the northeast of Saudi
Arabia on 4 June 2013 (Fig. 1). The later Hafr-Al-Batin cases include a
family cluster of MERS cases. Sequences were obtained from three contacts
of the index case (Hafr-Al-Batin_4, Hafr-Al-Batin_5, and Hafr-Al-Batin_6)
and a contact of Hafr-Al-Batin_6 (Hafr-Al-Batin_2). The four contact
sequences cluster together. The close similarity between the
Hafr-Al-Batin clade viruses and Riyadh_12_2013 (Fig. 1) indicates a
possible link between these cases that has not been revealed
epidemiologically. Viruses from recent Madinah cases (Madinah_1_2013 and
Madinah_3_2013) and three Riyadh viruses (Riyadh_13_2013, Riyadh_14_2013,
and Riyadh_15_2013) cluster closely. In addition, two virus genomes from
Qatar MERS patients in October 2013 (10) also cluster in the
Hafr-Al-Batin_1 clade. No additional genomes were found in clade A or in
the Bisha_1/Riyadh_1 clade.

A time-resolved phylogeny was generated from all epidemio- 
logically unlinked viruses with genome coverage of >30%. The 
geographical locations of the ancestral viruses were coestimated 
and marked by color coding in the phylogenetic tree (Fig. 2), lead- 
ing to a prediction that the ancestors of most of the viral clades 
originated in Riyadh. 

FIG 3 Bayesian skyline plot (BSP) showing the changes in effective population size of MERS-CoV across time. The dashed black line indicates the median
population size estimated from the BMCMC used in the inference of the time-resolved phylogeny (see Fig. 2 and Materials and Methods). The gray shading
indicates the 95% highest posterior density of the estimated population size.

Evolutionary rate. A critical feature of an emerging virus is how quickly
it is changing. The evolutionary rate for the updated set of 42
epidemiologically unlinked MERS-CoV genomes was estimated as
1.12 × 10^-3 substitutions per site per year (95% credible interval
[95% CI], 8.76 × 10^-4; 1.37 × 10^-3), bringing the time to most recent
common ancestor (tMRCA) for clade B (all MERS-CoV except clade A) to
March 2012 (95% CI, December 2011; June 2012). This is within the credible
interval bounds of the previous estimation (17). Two codon positions in
the MERS-CoV genome exhibit evidence of episodic selection using the mixed
effects model of evolution (MEME; see Materials and Methods), spike codon
1020 (P = 0.014) and, more weakly, spike codon 158 (P = 0.059).
Furthermore, under an alternative selection analysis method (fast
unconstrained Bayesian approximation [FUBAR]), spike codon 509 is
suggested to be under positive selection.
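
To give a sense of scale for the evolutionary rate reported above, the sketch below converts the per-site rate into expected substitutions per genome per year, using the 30,119-nucleotide whole-genome length given in the Table 1 footnote. Applying a rate estimated from concatenated coding regions to the full genome length is a simplifying assumption, and the constant and function names are illustrative.

```python
GENOME_LENGTH_NT = 30_119            # whole-genome length from the Table 1 footnote
RATE_MEDIAN = 1.12e-3                # substitutions per site per year
RATE_INTERVAL = (8.76e-4, 1.37e-3)   # 95% credible interval


def substitutions_per_genome_per_year(rate, genome_length_nt=GENOME_LENGTH_NT):
    """Expected substitutions per genome per year for a per-site rate."""
    return rate * genome_length_nt


if __name__ == "__main__":
    median = substitutions_per_genome_per_year(RATE_MEDIAN)
    low, high = (substitutions_per_genome_per_year(r) for r in RATE_INTERVAL)
    print(f"{median:.1f} substitutions per genome per year "
          f"({low:.1f} to {high:.1f})")
```

Under these assumptions the estimate corresponds to roughly 34 substitutions per genome per year, which is the quantity that makes individual transmission chains distinguishable by sequencing.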

Population size. An estimation of the relative change in the population
size of MERS-CoV over time was made from the Gaussian Markov random field
(GMRF) Bayesian Skyride coalescent model (18), employed to infer the
time-resolved phylogeny. The Bayesian skyline plot (BSP) (Fig. 3) shows
that after the first documented MERS case in June 2012, the relative
MERS-CoV population size (i.e., the relative number of infections)
increased gradually, reaching a plateau at around April 2013. Since then,
the effective viral population size has decreased, reflecting the apparent
disappearance of multiple lineages (Riyadh_3, Buraidah_1, and Al-Hasa)
(Fig. 4A). Plotting genomes by clade and sample time (Fig. 4A) shows that
the viral clades appear limited in time, although we note a long time
interval between the beginning and end of the Riyadh_3 cluster, suggesting
the existence of undetected cases. Under the assumption of limited missing
cases, the average time of existence (first observed date to last observed
date) (Fig. 4A, see the legend) is 98 days for the four clades, although
the last variant, Hafr-Al-Batin_1, was still in circulation at the end of
the observation period. All 9 of the recently identified viruses from
Riyadh are from the Hafr-Al-Batin_1 clade, and no further Riyadh_3
variants have appeared in Riyadh.
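
The clade lifetimes discussed above are simple date differences, and the sketch below shows one way to recompute them from the first and last sample dates in the Fig. 4 legend. It rests on one labeled assumption: the Buraidah_1 start is taken as 13 May 2013, the Table 1 collection date of Buraidah_1_2013, because that date reproduces the 84 days given in the legend. The unweighted mean of the four values computed this way is 103 days, a little above the 98 days quoted in the text, so the published average presumably rests on inputs that are not listed in the legend.

```python
from datetime import date

# First and last observed sample dates per clade (Fig. 4 legend).
# Assumption: Buraidah_1 starts on 13 May 2013 (Table 1 collection date).
CLADE_SPANS = {
    "Al-Hasa": (date(2013, 4, 21), date(2013, 6, 22)),
    "Riyadh_3": (date(2013, 2, 5), date(2013, 7, 2)),
    "Buraidah_1": (date(2013, 5, 13), date(2013, 8, 5)),
    "Hafr-Al-Batin_1": (date(2013, 6, 4), date(2013, 10, 1)),
}


def clade_durations(spans):
    """Days between the first and last observed sample of each clade."""
    return {name: (last - first).days for name, (first, last) in spans.items()}


def mean_duration(spans):
    """Unweighted mean clade lifetime in days."""
    durations = clade_durations(spans)
    return sum(durations.values()) / len(durations)


if __name__ == "__main__":
    for name, days in clade_durations(CLADE_SPANS).items():
        print(f"{name}: {days} days")
    print(f"mean: {mean_duration(CLADE_SPANS):.1f} days")
```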

FIG 4 Distribution of MERS-CoV clades in time and space. (A) All available MERS-CoV genomes were stratified by phylogenetic clade (see Fig. 1) and plotted by virus
sample date. The length of each clade was determined as the difference in days between the first and last observed sample of that virus and yielded the following values:
Al-Hasa (21 April 2013 to 22 June 2013; 62 days), Riyadh_3 (5 February 2013 to 2 July 2013; 147 days), Buraidah_1 viruses (3 May 2013 to 5 August 2013; 84 days), and
Hafr-Al-Batin_1 (4 June 2013 to 1 October 2013; 119 days). (B) All available MERS-CoV genomes were stratified by phylogenetic clade (see Fig. 1) and plotted by the
case location. Cities are indicated by small black circles, and sequenced viruses by larger circles colored according to phylogenetic clade.

Geography. The variations of MERS-CoV genome sequences combined with
sample collection dates and locations can help identify the source of new
MERS-CoV infections. Four MERS-CoV monophyletic lineages containing 4 or
more cases and persisting for 2 months or more have been detected
(Al-Hasa, Riyadh_3, Buraidah_1, and Hafr-Al-Batin_1), and there are 6
sporadic viruses from Bisha, Riyadh, Makkah, and Al Zarqa, Jordan (Fig. 1
and 4B). The geographical locations of all available MERS-CoV genomic
sequences, labeled by clade, and the sporadic viruses (clade size, <4) are
plotted in Fig. 4B. The Al-Hasa variants (Fig. 4B, gray circles) were not
detected in any other part of Saudi Arabia, and the Al-Hasa region has
remained free of other virus variants, indicating that the Al-Hasa virus
source was constrained to the Al-Hasa region. The more recently emerged
Hafr-Al-Batin_1 variant (Fig. 4B, green circles) is now found in three KSA
locations (Riyadh, Hafr-Al-Batin, and Madinah), as well as in Qatar.
Riyadh_3 viruses (Fig. 4B, orange circles) are geographically dispersed
and were found in Riyadh, Wadi Ad-Dawasir, and Ta'if in Saudi Arabia, as
well as Qatar/London (15) and Abu Dhabi/Munich (16). The Buraidah_1 clade
(Fig. 4B, blue circles) has appeared in Buraidah, Ta'if, Musayt (in the
southern province of Asir), and in a patient from Dubai, United Arab
Emirates, in Valenciennes, France (19). The geographical dispersion of
MERS-CoV lineages suggests a mobile infection source, either as
human-to-human or nonhuman-to-human infections or via transported animal
product.

FIG 5 Substitutions in MERS-CoV spike proteins. All available KSA MERS-CoV spike ORFs were translated, the proteins aligned, and amino acid differences
from the reconstructed ancestral clade B protein determined; changes observed in more than one genome are marked by vertical colored bars, with the new amino
acid residue coded as shown at the bottom. Gray bars indicate a gap in sequence coverage. Functional domains of the spike (S) protein are marked and include
the N-terminal domain, the receptor binding domain, the fusion domain (Fusion), heptad repeats 1 and 2 (HR1 and HR2) (20, 42), the transmembrane (TM)
domain, and the cytoplasmic (Endo) domain (43).

Protein changes in MERS-CoV. The coding regions of the viral genome are
evolving at an average rate of 1.12 × 10^-3 substitutions per site per
year. Substitutions can be nonuniformly distributed, with coding regions
constrained by protein function and regions exposed to host innate or
adaptive immune responses showing greater levels of substitution. It is
important to monitor MERS-CoV amino acid substitutions that could signal
adaptation to human transmission, especially in proteins at the virus-host
interface. Changes in all MERS-CoV spike proteins are shown in Fig. 5.
Positive selection analysis using the MEME method revealed that spike
codon 1020 is under episodic selection, and using the FUBAR method, codon
509 is suggested to be under modest positive selection (see Materials and
Methods). The codon 1020 substitution is in heptad repeat 1 (HR1) of the
spike protein (Fig. 5; see also Fig. S1 in the supplemental material,
right panel), which may influence the membrane fusion activity of the
spike protein (20). MERS-CoV genomes in the Al-Hasa and Hafr-Al-Batin_1
clades encode an arginine at this position, while the Riyadh_3 clade
genome encodes a histidine. Nine genomes show amino acid substitutions in
the receptor-binding domain (RBD) of the spike protein, including a recent
genome, Riyadh_9, which has two amino acid substitutions in the RBD
(Fig. 5), and the two recent Qatar genomes. When mapped onto a reported
crystal structure of the human coronavirus Erasmus Medical Center/2012
(EMC/2012) RBD in complex with the human receptor dipeptidyl peptidase 4
(DPP4) (21) (Protein Data Bank [PDB] ID 4L72), nonsynonymous mutations are
observed in buried spike protein residues 482, 506, and 534; all are
conservative changes in terms of their amino acid properties. A change of
aspartic acid to glycine at codon 509 was observed in the Riyadh_1 and
Bisha_1 genomes, and this position was found to be under modest positive
selection. This residue is not part of, but is immediately adjacent to,
the spike-DPP4 binding interface (Fig. S1, left). However, none of the
changes in the RBD have been observed in multiple genomes, suggesting
limited transmission. Five amino acid substitutions persist in multiple
viruses (D158Y, Q1020R or Q1020H, T1202I, Q1208H, and S460F) (Fig. 5),
suggesting a neutral or positive consequence of the variant for the virus.
These include a Hafr-Al-Batin clade variant with both D158Y and Q1020R.
These combined changes first appeared in Riyadh_8 and Hafr-Al-Batin_1 and
are also present in the later viruses Hafr-Al-Batin_2, 5, and 6,
Riyadh_10, 11, 12, and 17, and Madinah_3. The S460F change in two recent
Qatar genomes is close to the spike-DPP4 binding interface. None of these
changes reach significance when examined by all positive selection
algorithms.
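
Substitution labels such as D158Y or Q1020R name the ancestral residue, the codon position, and the new residue. The sketch below shows one way such labels could be derived by translating an aligned ancestral and sample coding sequence with the standard genetic code and comparing them codon by codon. It is a minimal illustration and not the authors' script; the function names and the nine-nucleotide toy sequences are hypothetical, and the sequences are assumed to be in frame and already aligned without gaps.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def amino_acid_substitutions(ancestral_cds: str, sample_cds: str) -> list:
    """Return labels such as 'Q2R' for every nonsynonymous difference."""
    ancestral, sample = translate(ancestral_cds), translate(sample_cds)
    return [
        f"{a}{position}{s}"
        for position, (a, s) in enumerate(zip(ancestral, sample), start=1)
        if a != s and "X" not in (a, s)   # skip gaps and ambiguous codons
    ]


if __name__ == "__main__":
    # Toy sequences: M-Q-D versus M-R-D (CAA -> CGA at codon 2).
    print(amino_acid_substitutions("ATGCAAGAT", "ATGCGAGAT"))
```

Synonymous changes translate to the same residue and therefore produce no label, which is why only nonsynonymous differences are reported.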

DISCUSSION 

The study reported here significantly extends our previous report on 21
MERS-CoV genomes and the observation of three genetically distinct
lineages of MERS-CoV circulating in Riyadh. We concluded previously that
it was unlikely that the Riyadh infections were the result of a single
continuous human-to-human transmission chain (17) and suggested that
transmission within Saudi Arabia was consistent with either movement of an
animal reservoir or animal products or movement of infected humans. We now
present additional data from 32 new MERS-CoV genomes which show that 4
phylogenetic clades of viruses have been observed and 3 of these clades
were no longer detected in cases at the end of the current observation
period. This pattern of clade disappearance may be due to the increased
MERS surveillance and patient isolation that was implemented during the
course of the outbreak (14), combined with an R0 of less than 1 (11, 12),
but it could also reflect undiagnosed asymptomatic spread, and we note the
extended pattern of the Riyadh_3 cluster.

Adaptation of a zoonotic virus to a new host often requires 
sustained replication of the virus in the new host for the selection 
of amino acid changes that favor transmission. We find only lim- 
ited evidence of adaptation to human transmission in the form of 
positively selected amino acids in MERS-CoV lineages. However, 
none of the MERS-CoV clades have been observed to persist be- 
yond 2 to 3 months, and thus, sustained human transmission may 
not have occurred yet with MERS-CoV, although with the most 
recent MERS-CoV Hafr-Al-Batin_l variant, mortality has been 
observed in two young healthy patients. It is essential that careful 
monitoring of virus lineages and genome changes in the epidemic 
is maintained and that the functional consequences of these sub- 
stitutions in the spike and other viral proteins should be exam- 
ined. 

The spike amino acid changes at codon 1020, to arginine in the
Hafr-Al-Batin clade or to histidine in the Riyadh_3 clade, are
not predicted to change the alpha-helical structure of this region
(Fig. S1, right); however, the histidine provides an endosomal
protonated residue and the arginine provides a potential endo- 
somal protease cleavage site; either of these changes might alter the 
fusion function of this motif. The combination of HR1 with hep- 
tad repeat 2 (HR2) and the fusion domain are essential compo- 
nents of the fusion mechanism of the coronavirus spike protein 
and allow passage of the virus across the endosomal membrane 
(20). Changes in HR1 are associated with host range expansion of 
murine hepatitis virus (22). The external orientation of the spike 
protein may expose it to immune selection, and such changes are 
important information when designing reagents for serological 
testing. Changes in the coronavirus spike have been reported to 
accompany coronavirus host switches (22) and the SARS corona- 
virus adaptation to humans (23-25), and such changes should be 
monitored for their effects on the receptor binding and transmis- 
sion properties of the virus. In particular, spike changes of D158Y, 
D509G, Q1020R, Q1020H, T1202I, and Q1208H should be tested
for altered biological properties. 

The MERS-CoV-encoded enzymes are obvious targets for antiviral drugs, and
screening efforts should use viral enzymes representative of the currently
circulating forms of the virus. The major 3C protease, required for
multiple cleavages of the replicase polyproteins, shows a high level of
conservation, with only three nonsynonymous changes observed across all
known MERS-CoV. The viral papain-like protease (PLP) is required for
cleavage of the open reading frame 1a (ORF1a) polyprotein and may
antagonize host immune signaling (26, 27). The Al-Hasa lineage shows a
sustained A160S substitution in PLP, while the later viruses in the
Hafr-Al-Batin lineage also have an R911C substitution in PLP, close to the
catalytic CHD triad. In addition, position 90 shows substitutions in
Jordan_N3_2012 (K90G) and Wadi-Ad-Dawasir_1, Taif_1_2013, and Taif_4_2013
(K90E), and the changes may be relevant for enzyme activity. The viral
ADP-ribose-1"-monophosphatase (ADRP) has two conserved domains required
for activity: VNAAN at positions 290 to 294 and GIF at 384 to 386.
Wadi-Ad-Dawasir_1 virus shows a change to VNAVN, and a number of sustained
amino acid substitutions have occurred in the amino-terminal half of ADRP.

Considerable effort has been made to determine an animal 
source for MERS-CoV. To date, serological evidence for a cross- 
reactive virus in camels has been reported (7, 8), and a small frag- 
ment of MERS-CoV sequence has been identified in a bat from 
Saudi Arabia (9). Recently, a camel in contact with a case in Saudi 
Arabia tested positive for MERS-CoV by PCR (28); however, mul- 
tiple attempts at deep sequencing failed to yield convincing 
MERS-CoV sequences from the 2 camel nasal samples, despite the 
availability of a complete genome obtained from the patient 
(M. Cotten, S. J. Watson, P. Kellam, H. Q. Makhdoom, 
Z. A. Memish, unpublished results). More recently, 5 fragments of 
sequence were obtained from a camel cared for by a MERS patient 
in Qatar (10); these fragments were phylogenetically related to 
whole-genome sequences of MERS-CoV from two patients in 
contact with the camel (Qatar_3_2013 and Qatar_4_2013) 
(Fig. 1), providing support for MERS-CoV infection in camels 
and suggesting camels as an animal reservoir for the virus. Zoo- 
notic movements from an animal reservoir to humans have oc- 
curred with the SARS coronavirus (6, 25, 29). It is unclear to what 
extent "chatter" occurs between such an animal reservoir and hu- 
mans before a purely human infection becomes sustained. The 
strongest argument for a persistent animal reservoir may be that 
the occurrence of MERS-CoV infections in multiple sites in Saudi 
Arabia, as well as in Jordan, Qatar, and United Arab Emirates 
(Dubai and Abu Dhabi), is unlikely to be sustained by the ob- 
served limited human-to-human MERS-CoV transmission, and 
thus, a more widespread population of MERS-CoV in animals 
could exist. However, the pattern of MERS-CoV lineages we have 
documented here is not consistent with a uniform gradient of 
MERS-CoV evolution across the Arabian peninsula. Instead, it is 
more consistent with the movement of infected livestock or ani- 
mal products. This conclusion is suggested by the appearance of 
the Hafr-Al-Batin_1 lineage in Riyadh, Hafr-Al-Batin, Madinah,
and Qatar or the Riyadh_3 lineage in Riyadh, Wadi Ad-Dawasir, 
Ta'if, Qatar, and United Arab Emirates. The appearance of phylo- 
genetically related MERS-CoV in geographically distant locations 
must be taken into account in efforts to identify the animal source 
and transmission of the virus. 

We have estimated the time of the most recent common ances- 
tor (tMRCA) as March 2012, consistent with the initial case de- 
tection. It should be noted that the tMRCA only estimates when 
the currently circulating viruses were last in a single host; it does 
not tell us what that host was. Although we only have viral se- 
quences isolated from human patients, it is plausible that this virus 
was in an as-yet-unidentified animal reservoir. The fact that we 
only have viruses isolated from human cases (and one camel 
linked to a human case) may simply represent a strong ascertain- 
ment bias toward severe human disease. 

In conclusion, the rapid identification and isolation of cases, combined
with an R0 of less than 1, may control the human-to-human transmission as
long as the virus transmission properties remain the same. Full control of
the MERS epidemic requires identification of the source of infections to
prevent the initiation of the observed human-to-human transmission chains.

TABLE 1 Summary of all MERS-CoV genomic sequences used in this study

Genome                    Sample collection date  Genome fraction (a)   GenBank accession number or source
Jordan_N3_2012            15 April 2012           1                     KC776174
Bisha_1_2012              19 June 2012            1                     KF600620
England-Qatar_2012        19 September 2012       1                     KC667074
Riyadh_1_2012             23 October 2012         1                     KF600612
Riyadh_2_2012             30 October 2012         1                     KF600652
Riyadh_3_2013             5 February 2013         1                     KF600613
England2-HPA_2013         10 February 2013        1                     http://www.hpa.org.uk/Topics/InfectiousDiseases/InfectionsAZ/MERSCoV/respPartialgeneticsequenceofnovelcoronavirus/
Riyadh_4_2013             1 March 2013            1                     KJ156952
Munich_AbuDhabi_2013      22 March 2013           1                     KF192507
Al-Hasa_2_2013            21 April 2013           1                     KF186566
Al-Hasa_3_2013            22 April 2013           1                     KF186565
Al-Hasa_24_2013           1 May 2013              0.41                  KJ156867, KJ156919, KJ156875, KJ156885, KJ156870, KJ156892, KJ156902
Al-Hasa_4_2013            1 May 2013              1                     KF186564
Al-Hasa_7_2013            1 May 2013              0.93                  KF600623, KF600655
Al-Hasa_8_2013            1 May 2013              0.74                  KF600618, KF600626, KF600635, KF600638
Al-Hasa_9_2013            1 May 2013              0.46                  KF600622, KF600639, KF600648, KF600649, KF600654
Al-Hasa_25_2013           2 May 2013              1                     KJ156866
Al-Hasa_10_2013           2 May 2013              0.32                  KF600614, KF600624, KF600629, KF600636, KF600641, KF600642, KF600646, KF600653
Al-Hasa_11_2013           3 May 2013              0.9                   [illegible]
Al-Hasa_12_2013           7 May 2013              1                     KF600627
Al-Hasa_13_2013           7 May 2013              0.37                  KF600616, KF600637, KF600640, KF600650, KF600656
France_UAE_2013           7 May 2013              0.99                  KF745068
Al-Hasa_14_2013           8 May 2013              0.75                  KF600615, KF600643
Al-Hasa_1_2013            9 May 2013              [illegible]           KF186567
Al-Hasa_22_2013           9 May 2013              0.47                  KF600617, KF600619, KF600621, KF600625, KF600631, KF600633
Al-Hasa_15_2013           11 May 2013             1                     [illegible]
Al-Hasa_16_2013           12 May 2013             1                     KF600644
Al-Hasa_23_2013           13 May 2013             0.76                  KJ156860, KJ156894, KJ156929, KJ156923, KJ156862
Buraidah_1_2013           13 May 2013             1                     KF600630
Al-Hasa_17_2013           15 May 2013             1                     KF600647
Al-Hasa_19_2013           23 May 2013             1                     KF600632
Al-Hasa_18_2013           23 May 2013             1                     KF600651
Al-Hasa_21_2013           30 May 2013             1                     KF600634
Hafr-Al-Batin_1_2013      4 June 2013             1                     KF600628
Taif_1_2013               12 June 2013            1                     KJ156949
Wadi-Ad-Dawasir_1_2013    12 June 2013            1                     KJ156881
Taif_2_2013               12 June 2013            0.94                  KJ156896, KJ156876
Taif_3_2013               13 June 2013            0.62                  KJ156938, KJ156897, KJ156922, KJ156868, KJ156921, KJ156915, KJ156906
Taif_4_2013               13 June 2013            0.27                  KJ156886, KJ156871
Al-Hasa_26_2013           18 June 2013            0.99                  KJ156882, KJ156941, KJ156872
Al-Hasa_27_2013           19 June 2013            0.94                  KJ156943, KJ156939
Al-Hasa_28_2013           22 June 2013            0.71                  KJ156887, KJ156940, KJ156889, KJ156893, KJ156884, KJ156930, KJ156928, KJ156909
Riyadh_5_2013             2 July 2013             1                     KJ156944
Riyadh_6_2013             2 July 2013             0.73                  KJ156879, KJ156947, KJ156890, KJ156908, KJ156927
Asir_1_2013               2 July 2013             0.44                  KJ156948, KJ156925, KJ156903, KJ156883
Riyadh_7_2013             15 July 2013            0.97                  KJ156937, KJ156905
Riyadh_9_2013             17 July 2013            1                     KJ156869
Riyadh_8_2013             17 July 2013            0.99                  KJ156880, KJ156942
Hafr-Al-Batin_2_2013      5 August 2013           1                     KJ156910
Riyadh_10_2013            5 August 2013           0.95                  KJ156891, KJ156936, KJ156907
Asir_2_2013               5 August 2013           0.65                  KJ156863, KJ156899, KJ156912, KJ156900, KJ156898, KJ156945, KJ156932
Riyadh_11_2013            6 August 2013           0.94                  KJ156946, KJ156911
Riyadh_12_2013            8 August 2013           0.95                  KJ156926, KJ156901
Riyadh_13_2013            13 August 2013          0.97                  KJ156888, KJ156873
Riyadh_14_2013            15 August 2013          1                     KJ156934
Riyadh_15_2013            19 August 2013          0.49                  KJ156914, KJ156877, KJ156878, KJ156859, KJ156933, KJ156953
Hafr-Al-Batin_5_2013      25 August 2013          0.63                  KJ156951, KJ156924, KJ156954, KJ156913
Hafr-Al-Batin_4_2013      25 August 2013          0.52                  KJ156931, KJ156895, KJ156864, KJ156861
Riyadh_17_2013            26 August 2013          1                     KJ156918, KJ156920, KJ156865
Hafr-Al-Batin_6_2013      28 August 2013          1                     KJ156874
Madinah_1_2013            1 September 2013        0.3                   KJ156935, KJ156904, KJ156917
Madinah_3_2013            11 September 2013       1                     KJ156950, KJ156916
Qatar_3_2013              1 October 2013          1                     KF961221
Qatar_4_2013              1 October 2013          1                     KF961222

(a) Fraction of genome obtained compared with a whole-genome value of 30,119 nucleotides.

mBio mbio.asm.org January/February 2014 Volume 5 Issue 1 e01062-13

Spread, Circulation, and Evolution of MERS-CoV

MATERIALS AND METHODS 

Sequence generation. Nucleic acid extracts from PCR-confirmed MERS- 
CoV-infected patient samples were processed for reverse transcription 
and PCR amplification as previously described (15). Briefly, nucleic acids 
were extracted from respiratory tract samples (Table 1) using automated
extraction. The MERS-CoV RNA genome was converted to DNA and
amplified by PCR in 15 overlapping amplicons. All amplicons for a sample
were pooled and processed into Illumina libraries, and sequencing was 
performed with an Illumina MiSeq instrument to generate 2 million to 5 
million 150-nucleotide paired-end reads per sample. The readsets were 
processed to remove primer and adapter sequences by using QUASR (30) 
and assembled into whole genomes using de novo assembly with SPAdes 
(31). The assembly fidelity was verified by monitoring intact open reading 
frames and through comparison with the genome prepared with 
reference-based assembly using SMALT (version 0.5.0) (32), with differ- 
ences resolved by examining the raw read data. 
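
The sequencing yield described above implies a very high nominal read depth, and the sketch below makes that arithmetic explicit. It is an idealized upper bound under stated assumptions: every read is taken to be a full 150-nucleotide viral read spread evenly over the 30,119-nucleotide genome, which ignores primer and adapter trimming, host-derived reads, and uneven amplicon representation. The function name is illustrative.

```python
def mean_depth(n_reads, read_length_nt=150, genome_length_nt=30_119):
    """Nominal mean depth of coverage: total sequenced bases / genome length."""
    return n_reads * read_length_nt / genome_length_nt


if __name__ == "__main__":
    for n_reads in (2_000_000, 5_000_000):
        print(f"{n_reads:,} reads: about {mean_depth(n_reads):,.0f}-fold depth")
```

Even allowing for substantial losses, depth on this scale is what allows differences between de novo and reference-based assemblies to be resolved from the raw reads.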

Phylogenetic methods. The 32 new genomes were aligned with the 33
published MERS-CoV genomes using MUSCLE (33) implemented in MEGA5 (34).
Bayesian inference of the phylogeny was performed with MrBayes version
3.2.1 (35) using a general time-reversible (GTR) substitution model with a
4-category discrete approximation of a gamma distribution (GTR+Γ4) to
represent among-site heterogeneity. For inference of the time-resolved
phylogeny, a second subalignment of 42 genomes was generated by removing
epidemiologically linked sequences. Sequences were considered linked if
there was epidemiological evidence for contact between the patients that
was also supported by the viral genetic data. If the observed number of
mutations between the viral genomes fell below the 95% upper confidence
interval of the Poisson cumulative distribution function, whose expected
value is calculated from the evolutionary rate of the virus, the length of
the genome, and the length of time between the samples, then only the
index genome was retained. The main coding regions of the genome (encoding
ORF1ab, S, E, M, and N) were concatenated, and a codon-partitioning model
of evolution was applied to the data set. Time-resolved phylogeny was
inferred under a codon-partitioned HKY+Γ4 substitution model (Hasegawa,
Kishino, and Yano substitution model with a 4-category discrete
approximation of a gamma distribution), with an uncorrelated lognormal
molecular clock and a GMRF Bayesian Skyride coalescent model, using a
Bayesian Markov-chain Monte Carlo (BMCMC) approach implemented in BEAST
version 1.8.0 (36). Ancestral geographical states were coestimated using
the Bayesian stochastic search variable selection (37). Models employing
reversible or nonreversible transition rate matrices were assessed by
comparing the marginal likelihood estimator of the BMCMC chains, produced
through the path-sampling approach implemented in BEAST (38). The Bayesian
skyline plot, estimating the change in effective population size through
time, was generated from the BEAST BMCMC output files using Tracer version
1.5. Hypothetical ancestral sequences were determined using a
likelihood-based ancestral reconstruction method implemented in HYPHY
version 2.1.2 (39). Nonsynonymous substitutions were determined using
custom Python scripts. Codon positions under episodic selection (40) were
determined using the mixed effects model of evolution (MEME) (40) or fast
unconstrained Bayesian approximation (FUBAR) (41) implemented in HYPHY.
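
The genetic criterion for treating two sequences as linked can be sketched as a Poisson test. The code below is one possible reading of the description above and not the authors' implementation: the expected number of substitutions is the evolutionary rate multiplied by the genome length and by the time between samples, the threshold is the smallest count whose cumulative Poisson probability reaches 95%, and a pair counts as linked when the observed number of differences does not exceed that threshold. The default rate and genome length are the values reported in this study; the function names, the use of 365.25 days per year, and the treatment of a count equal to the threshold as linked are assumptions.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def poisson_upper_bound(mean: float, level: float = 0.95) -> int:
    """Smallest count k such that P(X <= k) >= level."""
    k = 0
    while poisson_cdf(k, mean) < level:
        k += 1
    return k


def expected_substitutions(days_apart, rate=1.12e-3, genome_length_nt=30_119):
    """Expected substitutions between two samples taken days_apart days apart."""
    return rate * genome_length_nt * days_apart / 365.25


def genetically_linked(observed_differences: int, days_apart: float) -> bool:
    """True if the observed differences are within the 95% Poisson bound."""
    bound = poisson_upper_bound(expected_substitutions(days_apart))
    return observed_differences <= bound


if __name__ == "__main__":
    # Hypothetical pair: 2 differences between samples taken 30 days apart.
    print(genetically_linked(observed_differences=2, days_apart=30))
```

In the procedure described above, this genetic check supplements epidemiological evidence of contact; it does not replace it.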

Nucleotide sequence accession numbers. GenBank accession num- 
bers for the new and previously published genomes are listed in Table 1. 

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at
http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01062-13/-/DCSupplemental.
Figure S1, TIF file, 4.2 MB.